Leaderboards

Speech generation

Last updated: March 11, 2025

Our speech generation leaderboard evaluates AI models on their ability to generate high-quality speech from text prompts. We assess factors such as speech quality, word error rate, and naturalness.

Rank  Model         Elo rating  TrueSkill rating
1     Eleven Labs   1254        1119
2     Open AI TTS   1119        1106
3     AWS Polly     1103        1044
4     Cartesia      1043        1050
5     Kokoro        968         899
6     Deepgram      865         927
7     Google TTS    811         807
8     XTTS-V2       694         797

What is Elo rating?

Elo is a dynamic rating system used in chess and other competitive games to rank players; here it is applied to models instead. A higher Elo rating indicates better performance in head-to-head comparisons. After each match, both models' ratings are adjusted according to how the actual result compares with the result their current ratings predicted, and the K-factor (32) sets the maximum rating change per match.
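The Elo update described above can be sketched in a few lines of Python. This is a minimal sketch of the standard Elo formulas, assuming the conventional 400-point logistic scale and that each comparison yields a win, loss, or tie for one model. The page states only the K-factor, so the scale, the tie handling, and the function names `expected_score` and `update` are illustrative assumptions, not the leaderboard's actual implementation.

```python
"""Minimal Elo sketch (K = 32), assuming each head-to-head comparison
between two TTS models yields a win, loss, or tie for model A."""

K_FACTOR = 32  # K-factor stated on the leaderboard page


def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B on the standard 400-point logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, score_a: float,
           k: float = K_FACTOR) -> tuple[float, float]:
    """Return updated (rating_a, rating_b) after one comparison.

    score_a is 1.0 if A was preferred, 0.0 if B was preferred, 0.5 for a tie
    (the tie value is an assumption).
    """
    exp_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - exp_a)  # |delta| is always below k
    # Zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Ratings from the leaderboard table: Eleven Labs (A) vs Open AI TTS (B).
    a, b = 1254, 1119
    print(f"Expected score for A: {expected_score(a, b):.3f}")
    # Suppose raters prefer the lower-rated model in one comparison (an upset).
    new_a, new_b = update(a, b, score_a=0.0)
    print(f"After an upset: A={new_a:.1f}, B={new_b:.1f}")
```

With the table's ratings for Eleven Labs (1254) and Open AI TTS (1119), the expected score for Eleven Labs is roughly 0.69, so under these assumptions a single upset loss would cost it about 22 points, which Open AI TTS would gain.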

Human preference evaluation

Diverse pool of US-based Alignerrs, including generalists and creative artists

Consensus of three Alignerrs per task

Standardized instructions and ontology for consistent evaluations

Carefully curated prompt generation process, balancing creativity and clarity

Models are rated on three criteria: Context Awareness, Pronunciation Accuracy, and Prosody Accuracy.

[Chart: Context Awareness results by model for the eight models in the leaderboard]

Context Awareness description:

Assesses a text-to-speech model's ability to understand contextual information throughout the text and adapt its output to the linguistic and situational context. Examples include adjusting tone, changing emphasis and rhythm, and interpreting punctuation.

Rating options:

High

Medium

Low
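The page states that each task receives a consensus of three Alignerrs but does not say how their High/Medium/Low ratings are combined. One plausible rule, assumed here purely for illustration, is the median of the three ordinal ratings: with three votes it returns the majority label whenever at least two raters agree, and falls back to Medium when all three disagree. The numeric encoding and the `consensus` function are hypothetical.

```python
from statistics import median_low

# Hypothetical numeric encoding of the rating options (an assumption).
SCALE = {"Low": 0, "Medium": 1, "High": 2}
LABELS = {value: label for label, value in SCALE.items()}


def consensus(ratings: list[str]) -> str:
    """Combine three annotator ratings into one label via the median.

    With exactly three ordinal votes, the median equals the majority label
    whenever two or more annotators agree, and is "Medium" when the votes
    are Low, Medium, and High.
    """
    if len(ratings) != 3:
        raise ValueError("expected exactly three ratings")
    scores = [SCALE[r] for r in ratings]  # raises KeyError on an unknown label
    return LABELS[median_low(scores)]


if __name__ == "__main__":
    print(consensus(["High", "High", "Low"]))
    print(consensus(["Low", "Medium", "High"]))
```

A median rather than a plain majority vote is used in this sketch because it always produces a label, even when the three raters disagree completely.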